Search results for "Variable selection"
showing 10 items of 24 documents
Evaluation of the effect of chance correlations on variable selection using Partial Least Squares-Discriminant Analysis
2013
Variable subset selection is often mandatory in high-throughput metabolomics and proteomics. However, depending on the variable-to-sample ratio, variable selection is highly susceptible to chance correlations. The evaluation of the predictive capabilities of PLSDA models estimated by cross-validation after feature selection provides overly optimistic results if the selection is performed on the entire set and no external validation set is available. In this work, a simulation of the statistical null hypothesis is proposed to test whether the discrimination capability of a PLSDA model after variable selection estimated by cross-validation is statistically higher than t…
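The optimism described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: a nearest-centroid classifier stands in for PLSDA, features are ranked by a simple difference of class means, and the data are pure noise with arbitrary labels, so any apparent discrimination is due to chance. The function names, the sample sizes and the seed are all hypothetical choices made for illustration.

```python
import random

def top_k_features(X, y, k):
    """Rank features by |difference of class means| and return the k best indices."""
    p = len(X[0])
    scores = []
    for j in range(p):
        g0 = [row[j] for row, label in zip(X, y) if label == 0]
        g1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(abs(sum(g1) / len(g1) - sum(g0) / len(g0)))
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:k]

def predict_nearest_centroid(X_train, y_train, x, features):
    """Assign x to the class whose centroid on the chosen features is closest."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        dist = 0.0
        for j in features:
            centroid_j = sum(r[j] for r in rows) / len(rows)
            dist += (x[j] - centroid_j) ** 2
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(X, y, k, select_inside_cv):
    """Leave-one-out accuracy, selecting features inside each fold or once on all data."""
    n = len(X)
    global_features = top_k_features(X, y, k)  # uses every sample, including test ones
    correct = 0
    for i in range(n):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = top_k_features(X_tr, y_tr, k) if select_inside_cv else global_features
        correct += predict_nearest_centroid(X_tr, y_tr, X[i], feats) == y[i]
    return correct / n

# Pure-noise data: many variables, few samples, labels unrelated to the features.
rng = random.Random(0)
n, p, k = 20, 500, 10
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [i % 2 for i in range(n)]

biased = loo_accuracy(X, y, k, select_inside_cv=False)  # selection on the entire set
honest = loo_accuracy(X, y, k, select_inside_cv=True)   # selection repeated per fold
print(f"selection on all data: {biased:.2f}, selection inside CV: {honest:.2f}")
```

With a large variable-to-sample ratio, the first figure would typically be expected to sit well above the second, even though no real class difference exists. Repeating the whole procedure over many random label assignments gives an empirical null distribution against which an observed accuracy could be compared.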
Variable selection in the analysis of energy consumption-growth nexus
2015
There is abundant empirical literature that focuses on whether energy consumption is a critical driver of economic growth. The evolution of this literature has largely consisted of attempts to solve the problems and answer the criticisms arising from earlier studies. One of the most common criticisms is that previous work concentrates on the bivariate relationship, energy consumption–economic growth. Many authors try to overcome this critique using control variables. However, the choice of these variables has been ad hoc, made according to the subjective economic rationale of the authors. Our contribution to this literature is to apply a robust probabilistic model to select the explanatory …
Induced smoothing in LASSO regression
The thesis is being carried out with the National Research Council at the Institute of Biomedicine and Molecular Immunology "Alberto Monroy" of Palermo, where I am a fellow, under the supervision of MD Stefania La Grutta. Our research unit focuses on clinical research into allergic respiratory problems in children. In particular, we are interested in assessing the determinants of impaired lung function in a sample of outpatient asthmatic children aged between 5 and 17 years enrolled from 2011 to 2017. Our dataset comprises n = 529 children and several covariates regarding host and environmental factors. This thesis focuses on hypothesis testing in lasso regression, when one is interes…
Variable selection with unbiased estimation: the CDF penalty
2022
We propose a new SCAD-type penalty in general regression models. The new penalty can be considered a competitor of the LASSO, SCAD or MCP penalties, as it guarantees sparse variable selection, i.e., null regression coefficient estimates, while attenuating bias for the non-null estimates. In this work, the method is discussed, and some comparisons are presented.
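For orientation, the three competitor penalties named in this abstract have the following textbook forms; the abstract does not give the new CDF penalty, so it is not reproduced here. The symbols are assumptions of this note rather than the abstract's notation: $\theta$ is a regression coefficient, $\lambda > 0$ the tuning parameter, $a > 2$ the SCAD shape parameter and $\gamma > 1$ the MCP shape parameter.

$$
p^{\mathrm{LASSO}}_{\lambda}(\theta) = \lambda\,|\theta|
$$

$$
p^{\mathrm{SCAD}}_{\lambda,a}(\theta) =
\begin{cases}
\lambda\,|\theta| & |\theta| \le \lambda,\\[4pt]
\dfrac{2a\lambda|\theta| - \theta^{2} - \lambda^{2}}{2(a-1)} & \lambda < |\theta| \le a\lambda,\\[8pt]
\dfrac{(a+1)\lambda^{2}}{2} & |\theta| > a\lambda,
\end{cases}
$$

$$
p^{\mathrm{MCP}}_{\lambda,\gamma}(\theta) =
\begin{cases}
\lambda\,|\theta| - \dfrac{\theta^{2}}{2\gamma} & |\theta| \le \gamma\lambda,\\[8pt]
\dfrac{\gamma\lambda^{2}}{2} & |\theta| > \gamma\lambda.
\end{cases}
$$

All three behave like $\lambda|\theta|$ near zero, which is what produces exactly null estimates. SCAD and MCP become constant for large $|\theta|$, so their derivative vanishes there and large coefficients are no longer shrunk, whereas the LASSO penalty keeps a constant slope $\lambda$ and biases every non-null estimate.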
Geographic mosaic of selection by avian predators on hindwing warning colour in a polymorphic aposematic moth
2020
Warning signals are predicted to develop signal monomorphism via positive frequency-dependent selection (+FDS) albeit many aposematic systems exhibit signal polymorphism. To understand this mismatch, we conducted a large-scale predation experiment in four locations, among which the frequencies of hindwing warning coloration of aposematic Arctia plantaginis differ. Here we show that selection by avian predators on warning colour is predicted by local morph frequency and predator community composition. We found +FDS to be strongest in monomorphic Scotland, and in contrast, lowest in polymorphic Finland, where different predators favour different male morphs. +FDS was also found in Geo…
Model uncertainty and variable selection: an application to the modelization of FDI determinants in Europe
2019
Recent decades have seen growing interest in FDI, and an increasing debate about its modelling in terms of the variables considered as its determinants, the model specification, and the estimation methods for the FDI gravity model. This is due to the uncertainty surrounding both the theories and the empirical approaches to FDI. This doctoral thesis aims to contribute to the literature by investigating the driving forces of MNE activities to and from European countries, at both the regional and national level, addressing the problems of variable selection and model uncertainty that arise when modelling the…
Differential geometric least angle regression: a differential geometric approach to sparse generalized linear models
2013
Sparsity is an essential feature of many contemporary data problems. Remote sensing, various forms of automated screening and other high throughput measurement devices collect a large amount of information, typically about few independent statistical subjects or units. In certain cases it is reasonable to assume that the underlying process generating the data is itself sparse, in the sense that only a few of the measured variables are involved in the process. We propose an explicit method of monotonically decreasing sparsity for outcomes that can be modelled by an exponential family. In our approach we generalize the equiangular condition in a generalized linear model. Although the …
Scad-elastic net and the estimation of individual tourism expenditure determinants
2014
This paper introduces the use of scad-elastic net in the assessment of the determinants of individual tourist spending. This technique addresses two estimation-related issues of primary importance. So far, studies in the tourism literature have made wide use of classic regressions, whose results might be affected by multicollinearity. In addition, because of the absence of a robust economic theory of tourism behavior, regressor selection is often left to the researcher's choice when not driven by non-optimal automatic criteria. Scad-elastic net is an OLS model that accounts for both these problems by including two types of parameter constraints, namely the smoothly clipped absolute deviation …
Estimation of sparse generalized linear models: the dglars package
2013
dglars is a publicly available R package that implements the method proposed in Augugliaro, Mineo and Wit (2013), developed to study the sparse structure of a generalized linear model. This method, called dgLARS, is based on a differential geometrical extension of the least angle regression method (LARS). The core of the dglars package consists of two algorithms implemented in Fortran 90 to efficiently compute the solution curve: a predictor-corrector algorithm and a cyclic coordinate descent algorithm.
Extended differential geometric LARS for high-dimensional GLMs with general dispersion parameter
2018
A large class of modeling and prediction problems involves outcomes that belong to an exponential family distribution. Generalized linear models (GLMs) are a standard way of dealing with such situations. Even in high-dimensional feature spaces GLMs can be extended to deal with such situations. Penalized inference approaches, such as the $\ell_1$ or SCAD, or extensions of least angle regression, such as dgLARS, have been proposed to deal with GLMs with high-dimensional feature spaces. Although the theory underlying these methods is in principle generic, the implementation has remained restricted to dispersion-free models, such as the Poisson and logistic regression models. The aim of this…